Search CORE

25 research outputs found

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bryan Catanzaro
Nicolas Pinto
Paul Ivanov
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier

arXiv.org e-Print Archive

Crossref

Decoupled Vector-Fetch Architecture with a Scalarizing Compiler

Author: Yunsup Lee
Publication venue: 'California Digital Library (CDL)'
Publication date: 01/01/2016

As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and compilers reveals an opportunity to construct a new data-parallel machine that is highly performant and efficient, yet a favorable compiler target that maintains the same level of programmability as the others. In this thesis, I present the Hwacha decoupled vector-fetch architecture as the basis of a new data-parallel machine. I reason through the design decisions while describing its programming model, microarchitecture, and LLVM-based scalarizing compiler that efficiently maps OpenCL kernels to the architecture. The Hwacha vector unit is implemented in Chisel as an accelerator attached to a RISC-V Rocket control processor within the open-source Rocket Chip SoC generator. Using complete VLSI implementations of Hwacha, including a cache-coherent memory hierarchy in a commercial 28 nm process and simulated LPDDR3 DRAM modules, I quantify the area, performance, and energy consumption of the Hwacha accelerator. These numbers are then validated against an ARM Mali-T628 MP6 GPU, also built in a 28 nm process, using a set of OpenCL microbenchmarks compiled from the same source code with our custom compiler and ARM's stock OpenCL compiler.

ProQuest OAI Repository

Decoupled Vector-Fetch Architecture with a Scalarizing Compiler

Author: Yunsup Lee
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and compilers reveals an opportunity to construct a new data-parallel machine that is highly performant and efficient, yet a favorable compiler target that maintains the same level of programmability as the others. In this thesis, I present the Hwacha decoupled vector-fetch architecture as the basis of a new data-parallel machine. I reason through the design decisions while describing its programming model, microarchitecture, and LLVM-based scalarizing compiler that efficiently maps OpenCL kernels to the architecture. The Hwacha vector unit is implemented in Chisel as an accelerator attached to a RISC-V Rocket control processor within the open-source Rocket Chip SoC generator. Using complete VLSI implementations of Hwacha, including a cache-coherent memory hierarchy in a commercial 28 nm process and simulated LPDDR3 DRAM modules, I quantify the area, performance, and energy consumption of the Hwacha accelerator. These numbers are then validated against an ARM Mali-T628 MP6 GPU, also built in a 28 nm process, using a set of OpenCL microbenchmarks compiled from the same source code with our custom compiler and ARM's stock OpenCL compiler.

Ezid

eScholarship - University of California

ProQuest OAI Repository

Regulation of lubricin for functional cartilage tissue regeneration: a review

Author: Jaehoon Choi
Nathaniel S. Hwang
Yunsup Lee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2018

Background: Lubricin is a chondrocyte-secreted glycoprotein that primarily provides boundary lubrication between joint surfaces. Besides its cytoprotective function and extracellular matrix (ECM) attachment, lubricin is recommended as a novel biotherapeutic protein that restores functional articular cartilage. Likewise, the malfunction of lubrication in damaged articular cartilage, which has complex and multifaceted causes, is a major concern in the field of cartilage tissue engineering. Main body: Although noticeable progress has been made toward cartilage tissue regeneration through numerous approaches such as autologous chondrocyte implantation, osteochondral grafts, and the microfracture technique, the functionality of engineered cartilage remains a challenge for complete reconstruction of cartilage. Thus, delicate modulation of lubricin along with cell/scaffold application will expand the research on cartilage tissue engineering. Conclusion: In this review, we discuss the empirical analysis of lubricin, from fundamental interpretation to the practical design of gene expression regulation.

SNU Open Repository and Archive

Directory of Open Access Journals

Convergence and Scalarization for Data-Parallel Architectures

Author: Krste Asanović
Ronny Krashinsky
Stephen W. Keckler
Vinod Grover
Yunsup Lee
Publication date: 01/01/2013

Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data parallelism in application kernels expressed as threaded code. One drawback of this approach compared to conventional vector architectures is redundant execution of instructions that are common across multiple threads, resulting in energy inefficiency due to excess instruction dispatch, register file accesses, and memory operations. This paper proposes to alleviate these overheads while retaining the threaded programming model by automatically detecting the scalar operations and factoring them out of the parallel code. We have developed a scalarizing compiler that employs convergence and variance analyses to statically identify values and instructions that are invariant across multiple threads. Our compiler algorithms are effective at identifying convergent execution even in programs with arbitrary control flow, identifying two-thirds of the opportunity captured by a dynamic oracle. The compile-time analysis leads to a reduction in instructions dispatched by 29%, register file reads and writes by 31%, memory address counts by 47%, and data access counts by 38%.

CiteSeerX

Crossref

Biomimetically Reinforced Polyvinyl Alcohol-Based Hybrid Scaffolds for Cartilage Tissue Engineering

Author: Hwan Kim
Nathaniel Hwang
Yongsung Hwang
Yunhye Kim
Yunsup Lee
Publication venue: 'MDPI AG'

Crossref

Efficient, high-quality image contour detection

Author: Bor-yiing Su
Bryan Catanzaro
Kurt Keutzer
Mark Murphy
Narayanan Sundaram
Yunsup Lee
Publication date: 01/01/2009

Image contour detection is fundamental to many image analysis applications, including image segmentation, object recognition and classification. However, highly accurate image contour detection algorithms are also very computationally intensive, which limits their applicability, even for offline batch processing. In this work, we examine efficient parallel algorithms for performing image contour detection, with particular attention paid to local image analysis as well as the generalized eigensolver used in Normalized Cuts. Combining these algorithms into a contour detector, along with careful implementation on highly parallel, commodity processors from Nvidia, our contour detector provides uncompromised contour accuracy, with an F-metric of 0.70 on the Berkeley Segmentation Dataset. Runtime is reduced from 4 minutes to 1.8 seconds. The efficiency gains we realize enable high-quality image contour detection on much larger images than previously practical, and the algorithms we propose are applicable to several image segmentation approaches. Efficient, scalable, yet highly accurate image contour detection will facilitate increased performance in many computer vision applications.

CiteSeerX

Crossref

Experimental assessment of the effect of frozen fringe thickness on frost heave

Author: Jang Young-Eun
Jin Hyun Woo
Lee Jangguen
Ryu Byun Hyun
Shin Yunsup
Publication venue: 'Techno-Press'
Publication date: 01/10/2019

A frozen fringe plays a key role in frost heave development in soils. Previous studies have focused on the physical and mechanical properties of the frozen fringe, such as overall hydraulic conductivity, water content and pore pressure. It has been proposed that the thickness of the frozen fringe controls frost heave behavior, but this effect has not been thoroughly evaluated. This study used a temperature-controllable cell to investigate the impact of frozen fringe thickness on the characteristics of frost heave. A series of laboratory tests was performed with various temperature boundary conditions and specimen heights, revealing that: (1) the amount and rate of development of frost heave are dependent on the frozen fringe thickness; (2) the thicker the frozen fringe, the thinner the resulting ice lens; and (3) care must be taken when using the frost heave ratio to characterize frost heave and evaluate frost susceptibility because the frost heave ratio is not a normalized factor but a specimen height-dependent factor.

ScholarWorks@UNIST